Dictionary refinements based on phonetic consensus and non-uniform pronunciation reduction

Authors

  • Gustavo Hernández Ábrego
  • Lex Olorenshaw
  • Raquel Tato
  • Thomas Schaaf
Abstract

In this paper we present a procedure for refining the recognition dictionary that combines two approaches to pruning unneeded pronunciations. First, pruning is applied non-uniformly, according to the characteristics of each word. Although this straightforward operation can produce high-quality dictionaries, it makes the refined dictionary heavily dependent on the data used in the process. Second, for words not observed in the data, we propose using multiple sequence alignment techniques to find phonetic consensus among the pronunciation variants and to select the pronunciations that will represent the unobserved words. Experimental results show that our dictionary refinement method improves recognition performance in two respects: it increases recognition accuracy by reducing cross-word confusability, and it improves recognition speed by reducing the complexity of the search space.
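
The two pruning steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name and threshold in it is an assumption made for illustration. For observed words, `prune_observed` keeps the most frequent variants until they cover a hypothetical `coverage` fraction of that word's observations, so the number of variants kept differs from word to word. For unobserved words, `consensus_variant` swaps in a much simpler technique than the multiple sequence alignment the abstract describes: it picks the variant with the smallest total phone-level edit distance to the other variants.

```python
def prune_observed(counts, coverage=0.9):
    """Keep the most frequent variants of one word until they account for
    `coverage` of its observations (assumed criterion, not the paper's)."""
    total = sum(counts.values())
    kept, covered = [], 0
    # Most frequent first; ties are broken by the phone sequence itself.
    for pron, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= coverage * total:
            break
        kept.append(pron)
        covered += n
    return kept


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]


def consensus_variant(variants):
    """Stand-in for consensus via multiple sequence alignment: return the
    variant closest, in total edit distance, to all variants of the word."""
    return min(variants,
               key=lambda v: (sum(edit_distance(v, o) for o in variants), v))


# Observed word with one dominant variant: a single pronunciation survives.
dominant = {("T", "AH", "M", "EY", "T", "OW"): 8,
            ("T", "AH", "M", "AA", "T", "OW"): 1,
            ("T", "M", "EY", "T", "OW"): 1}
# Observed word with evenly used variants: more pronunciations survive.
even = {("R", "UW", "T"): 1, ("R", "AW", "T"): 1, ("R", "UH", "T"): 1}

kept_dominant = prune_observed(dominant, coverage=0.5)
kept_even = prune_observed(even, coverage=0.5)

# Unobserved word: choose the variant most central to the candidate set.
candidates = [("D", "EY", "T", "AH"),
              ("D", "AE", "T", "AH"),
              ("D", "EY", "T", "AH", "R")]
chosen = consensus_variant(candidates)
```

In this toy data the phone symbols are ARPAbet-style placeholders. A real system would weight substitutions by phonetic similarity and align all variants jointly, which is what a multiple sequence alignment provides and this sketch does not.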

Related resources

Wiktionary as a source for automatic pronunciation extraction

In this paper, we analyze whether dictionaries from the World Wide Web that contain phonetic notations may support the rapid creation of pronunciation dictionaries within the speech recognition and speech synthesis system building process. As a representative dictionary, we selected Wiktionary [1] since it is available in multiple languages and, in addition to the definitions of the words, many...

Inferring Hierarchical Pronunciation Rules from a Phonetic Dictionary

This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is automatically inferred from a phonetic dictionary by incrementally analyzing deeper context levels, eventually representing a minimum set of exhaustive rules that pronounce without errors all the words in the train...

Large vocabulary continuous speech recognition based on cross-morpheme phonetic information

In this paper, we present a novel method to regulate lexical connections among morpheme-based pronunciation lexicons for Korean large vocabulary continuous speech recognition (LVCSR) systems. A pronunciation dictionary plays an important role in subword-based LVCSR in that pronunciation variations such as coarticulation will deteriorate the performance of an LVCSR system if it is not well accou...

HMM-based Pronunciation Dictionary Generation

In this paper, we discuss automatically generating a phonetic pronunciation from an orthographic spelling of words. The letter-sequence to phoneme-sequence mapping is useful in a variety of contexts, including text-to-speech applications, automatic spelling correction, and generating a pronunciation lexicon for a new training dataset which contains out-of-vocabulary words. A system based on hid...

The Effect of Using Phonetic Websites on Iranian EFL Learners’ Word Level Pronunciation

Computer-assisted language learning (CALL) is reaching a foremost position in the pedagogical field of English as a Second or Foreign Language (ESL/EFL). The present study was carried out to examine the effect of using phonetic websites on Iranian EFL students’ pronunciation and knowledge of phonemic symbols. Participants of the study included 30 female pre-intermediate EFL students studyin...

Publication date: 2004